(Experimental) Add support for NTK RoPE scaling #118
Conversation
Important update: previously, the alpha value wasn't being applied correctly. It is now, so setting alpha alone is enough for NTK RoPE scaling (there is no need to also set compress_pos_emb to the same value). Also added perplexity results from a test of a 30B model.
I might refactor this a bit later, but it seems okay. I'll merge it as is for now.
Hi, I'm confused by the current code. It seems like compress_pos_emb is still used alongside the alpha value, especially if we set compress_pos_emb to something other than 1 to scale the value of t. But dynamic scaling doesn't seem to do that. Is this intentional?
If compress_pos_emb is set to 1, the rotary embedding base is still 10000 (as if nothing changed). Ideally you want to set either compression or alpha, not both at the same time (for example, do not use compress 2 and alpha 2). Also, this implementation of NTK is static RoPE scaling; dynamic NTK scaling isn't implemented in exllama yet (it depends on the context length at generation time).
Oh, thank you for the clarification. So the base value is static, determined by alpha, correct? And then we can generate with a context longer than the default context size? If I'm reading the graph from the Reddit post correctly, with alpha 4 I could generate at a context size of 5000 without perplexity exploding, just like the yellow line in the graph?
@fahadh4ilyas Correct.
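To make the distinction above concrete, here is a minimal sketch of the two knobs (not the actual exllama code; the function name and defaults are illustrative): compress_pos_emb linearly compresses the position indices, while alpha applies static NTK scaling by raising the rotary base with the alpha ** (dim / (dim - 2)) rule from the linked Reddit post.

```python
import torch

def rope_tables(head_dim, base=10000.0, alpha=1.0, compress_pos_emb=1.0,
                seq_len=4096, device="cpu"):
    # Static NTK scaling: alpha raises the rotary base so long positions still
    # map onto frequencies close to what the model saw during training.
    scaled_base = base * alpha ** (head_dim / (head_dim - 2))
    inv_freq = 1.0 / (scaled_base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))

    # Linear (SuperHOT-style) scaling: compress_pos_emb divides the positions t.
    t = torch.arange(seq_len, device=device).float() / compress_pos_emb

    freqs = torch.outer(t, inv_freq)  # (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()

# alpha=4, compress_pos_emb=1: positions are untouched, only the base changes.
cos, sin = rope_tables(head_dim=128, alpha=4.0)
```

With alpha=4 and compress_pos_emb=1, the positions stay as-is and only the base grows, which is why setting alpha alone is now sufficient.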
This adds support for the new NTK RoPE scaling, mentioned in #115.
"According to this post, this is a method of rope scaling that result in less perplexity loss and a bigger possible scaling:
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/"
Adds the parameter "a", "alpha", which is used when loading a model with "-a"
Tested on 65B models at 4K context, with 48GB VRAM (2x24) using gs 16,20
Perplexity:
For tulu-30B-GPTQ (non-SuperHOT):
For Tulu-30B-SuperHOT-8K-4bit-32g:
Note: for 8K context and above, I suggest sticking with SuperHOT.
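For completeness, here is a sketch of how cos/sin tables like the ones above are typically applied to the query and key heads. This is the standard rotate-half RoPE formulation, not exllama's CUDA kernel; the tensor shapes and toy sizes are assumptions for illustration only.

```python
import torch

def rotate_half(x):
    # Split the head dimension in half, then swap and negate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin, positions):
    # cos/sin: (max_seq, head_dim // 2) tables such as those built above.
    # q, k:    (batch, seq_len, n_heads, head_dim)
    cos = torch.cat((cos, cos), dim=-1)[positions][None, :, None, :]
    sin = torch.cat((sin, sin), dim=-1)[positions][None, :, None, :]
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot

# Toy example: 4 tokens, 8 heads, head_dim 128, alpha = 4.
head_dim, seq = 128, 4
base = 10000.0 * 4.0 ** (head_dim / (head_dim - 2))
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(seq).float(), inv_freq)
q = torch.randn(1, seq, 8, head_dim)
k = torch.randn(1, seq, 8, head_dim)
q_rot, k_rot = apply_rope(q, k, freqs.cos(), freqs.sin(), torch.arange(seq))
```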